selection matrix
Learning Evolution via Optimization Knowledge Adaptation
Wang, Chao, Jiao, Licheng, Zhao, Jiaxuan, Li, Lingling, Liu, Fang, Yang, Shuyuan
Evolutionary algorithms (EAs) maintain populations through evolutionary operators to discover diverse solutions for complex tasks while gathering valuable knowledge, such as historical population data and fitness evaluations. However, traditional EAs face challenges in dynamically adapting to expanding knowledge bases, hindering the efficient exploitation of accumulated information and limiting adaptability to new situations. To address these issues, we introduce an Optimization Knowledge Adaptation Evolutionary Model (OKAEM), which features dynamic parameter adjustment using accumulated knowledge to enhance its optimization capabilities. OKAEM employs attention mechanisms to model the interactions among individuals, fitness landscapes, and genetic components separately, thereby parameterizing the evolutionary operators of selection, crossover, and mutation. These powerful learnable operators enable OKAEM to benefit from pre-learned extensive prior knowledge and self-tune with real-time evolutionary insights. Experimental results demonstrate that OKAEM: 1) exploits prior knowledge for significant performance gains across various knowledge transfer settings; 2) achieves competitive performance through self-tuning alone, even without prior knowledge; 3) outperforms state-of-the-art black-box baselines in a vision-language model tuning case; 4) can improve its optimization capabilities with growing knowledge; 5) is capable of emulating principles of natural selection and genetic recombination.
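The core idea of parameterizing a selection operator with attention can be illustrated with a toy NumPy sketch. This is not the paper's architecture; the shapes, projections `Wq`/`Wk`, and the way fitness is aggregated are all illustrative assumptions, showing only how attention over individuals can turn fitness values into learnable selection probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 8, 5, 4                      # population size, genome length, head width
pop = rng.normal(size=(n, d))          # candidate solutions (genomes)
fitness = rng.normal(size=n)           # one fitness score per individual

# Hypothetical learnable projections; in a model like OKAEM these would be trained.
Wq = rng.normal(size=(d, h))
Wk = rng.normal(size=(d, h))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_selection(pop, fitness, Wq, Wk):
    """Toy attention over individuals: each individual attends to the rest,
    and its attention-weighted view of the fitness landscape becomes its
    selection logit."""
    Q, K = pop @ Wq, pop @ Wk
    attn = softmax(Q @ K.T / np.sqrt(K.shape[1]))  # (n, n) pairwise interactions
    logits = attn @ fitness                        # context-aware fitness per individual
    return softmax(logits)                         # selection probabilities

probs = attention_selection(pop, fitness, Wq, Wk)
```

Because the operator is a differentiable function of `Wq` and `Wk`, its selection pressure can in principle be pre-trained on past runs and fine-tuned online, which is the adaptation mechanism the abstract describes.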
DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models
Gao, Shangqian, Lin, Chi-Heng, Hua, Ting, Tang, Zheng, Shen, Yilin, Jin, Hongxia, Hsu, Yen-Chang
Large Language Models (LLMs) have achieved remarkable success in various natural language processing tasks, including language modeling, understanding, and generation. However, the increased memory and computational costs associated with these models pose significant challenges for deployment on resource-limited devices. Structural pruning has emerged as a promising solution to reduce the costs of LLMs without requiring post-processing steps. Prior structural pruning methods either follow the dependence of structures at the cost of limiting flexibility, or introduce non-trivial additional parameters by incorporating different projection matrices. In this work, we propose a novel approach that relaxes the constraint imposed by regular structural pruning methods and eliminates the structural dependence along the embedding dimension. Our dimension-independent structural pruning method offers several benefits. Firstly, our method enables different blocks to utilize different subsets of the feature maps. Secondly, by removing structural dependence, we facilitate each block to possess varying widths along its input and output dimensions, thereby significantly enhancing the flexibility of structural pruning. We evaluate our method on various LLMs, including OPT, LLaMA, LLaMA-2, Phi-1.5, and Phi-2. Experimental results demonstrate that our approach outperforms other state-of-the-art methods, showing for the first time that structural pruning can achieve an accuracy similar to semi-structural pruning.
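The key relaxation, each block keeping its own subset of embedding channels rather than one subset shared across all blocks, can be sketched in a few lines of NumPy. The block structure below is a deliberate simplification (a single linear projection per block; the index sets and shapes are invented for illustration), not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model = 16                           # full embedding width
idx1 = np.arange(0, 10)                # channels kept by block 1
idx2 = np.arange(6, 16)                # block 2 keeps a DIFFERENT subset

W1 = rng.normal(size=(d_model, d_model))
W2 = rng.normal(size=(d_model, d_model))

def pruned_block(x, W_full, idx):
    """Apply a block's projection using only its selected embedding channels;
    equivalent to a binary selection matrix followed by the kept rows of W."""
    return x[:, idx] @ W_full[idx, :]

x = rng.normal(size=(3, d_model))      # residual-stream activations
y1 = pruned_block(x, W1, idx1)         # block 1 reads its own channel subset
y2 = pruned_block(x, W2, idx2)         # block 2 reads another, overlapping subset
```

Under regular structural pruning `idx1` and `idx2` would be forced to coincide; dropping that constraint is what lets each block have its own effective width along the embedding dimension.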
MoDeGPT: Modular Decomposition for Large Language Model Compression
Lin, Chi-Heng, Gao, Shangqian, Smith, James Seale, Patel, Abhishek, Tuli, Shikhar, Shen, Yilin, Jin, Hongxia, Hsu, Yen-Chang
Large Language Models (LLMs) have reshaped the landscape of artificial intelligence by demonstrating exceptional performance across various tasks. However, substantial computational requirements make their deployment challenging on devices with limited resources. Recently, compression methods using low-rank matrix techniques have shown promise, yet these often lead to degraded accuracy or introduce significant overhead in parameters and inference latency. This paper introduces \textbf{Mo}dular \textbf{De}composition (MoDeGPT), a novel structured compression framework that does not need recovery fine-tuning while resolving the above drawbacks. MoDeGPT partitions the Transformer block into modules composed of matrix pairs and reduces the hidden dimensions via reconstructing the module-level outputs. MoDeGPT is developed based on a theoretical framework that utilizes three well-established matrix decomposition algorithms -- Nystr\"om approximation, CR decomposition, and SVD -- and applies them to our redefined transformer modules. Our comprehensive experiments show MoDeGPT, without backward propagation, matches or surpasses previous structured compression methods that rely on gradient information, and saves 98% of compute costs on compressing a 13B model. On \textsc{Llama}-2/3 and OPT models, MoDeGPT maintains 90-95% zero-shot performance with 25-30% compression rates. Moreover, the compression can be done on a single GPU within a few hours and increases the inference throughput by up to 46%.
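One of the three decompositions, SVD, applied to a module built from a matrix pair can be sketched as follows. This is a minimal illustration of compressing the pair's end-to-end map to rank r (shrinking the hidden dimension between them); the module definition, shapes, and rank are assumptions, and the paper's actual framework also uses Nyström approximation and CR decomposition with module-level output reconstruction.

```python
import numpy as np

rng = np.random.default_rng(2)
d, d_hidden, r = 12, 48, 6              # embedding dim, hidden dim, target rank

W_up = rng.normal(size=(d, d_hidden))    # e.g. an MLP up-projection
W_down = rng.normal(size=(d_hidden, d))  # the paired down-projection

def svd_compress_pair(W_up, W_down, r):
    """Replace a (W_up, W_down) pair by a rank-r pair whose product is the
    best rank-r approximation (Eckart-Young) of the module's end-to-end map."""
    M = W_up @ W_down                    # (d, d) module-level map
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    sqrt_S = np.sqrt(S[:r])
    return U[:, :r] * sqrt_S, sqrt_S[:, None] * Vt[:r]

W1_c, W2_c = svd_compress_pair(W_up, W_down, r)
M = W_up @ W_down
```

The hidden dimension drops from 48 to 6, and the compressed pair is chosen to reconstruct the module output as closely as a rank-6 factorization allows, with no gradient computation involved.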
Optimal Text-Based Time-Series Indices
This integration is typically done by (i) selecting, (ii) transforming, and (iii) aggregating textual content into a time-series representation (see Ardia et al., 2019; Algaba et al., 2020, for a general overview of these steps). While many studies have focused on steps (ii) and (iii)-- transforming and aggregating textual data into a quantitative measure such as sentiment (see e.g., Loughran and McDonald, 2014; Jegadeesh and Wu, 2013; Manela and Moreira, 2017)--the essential selection step (i), which usually relies on subjective ad-hoc rules, has not received much attention yet. We aim to fill this gap in this article by proposing an approach to construct text-based time-series indices optimally. Specifically, our algorithm determines which set of texts, among a large corpus, leads to a text-based index that is optimal for a specific objective--typically, an index that maximizes the contemporaneous relation or the predictive performance with respect to a target variable, such as inflation. Our methodology relies on binary selection matrices that, applied to the vocabulary of tokens, select the relevant texts in the corpus.
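The selection step (i) can be sketched with a toy greedy optimizer: grow a binary selection vector over texts, at each step adding the text that most improves the correlation between the aggregated index and the target. This is an illustrative heuristic, not the article's algorithm; the data, the aggregation (a plain sum of per-text series), and the greedy search are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n_texts, T = 6, 40
series = rng.normal(size=(n_texts, T))   # one time series per text (e.g. monthly counts)
# Synthetic target driven by texts 0 and 2 plus noise
target = series[0] + series[2] + 0.1 * rng.normal(size=T)

def greedy_select(series, target):
    """Greedily build a binary selection vector s; the index is the sum of the
    selected texts' series, and we stop when no addition improves correlation."""
    n = series.shape[0]
    s = np.zeros(n, dtype=bool)
    best = -np.inf
    improved = True
    while improved:
        improved = False
        for i in np.flatnonzero(~s):
            cand = s.copy()
            cand[i] = True
            c = np.corrcoef(series[cand].sum(axis=0), target)[0, 1]
            if c > best:
                best, pick, improved = c, i, True
        if improved:
            s[pick] = True
    return s, best

sel, corr = greedy_select(series, target)
```

The boolean vector `sel` plays the role of one row of a binary selection matrix: applied to the corpus, it picks out the texts whose aggregate best tracks the target variable.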
Misalignment, Learning, and Ranking: Harnessing Users' Limited Attention
Agarwal, Arpit, Niazadeh, Rad, Patil, Prathamesh
In digital health and EdTech, recommendation systems face a significant challenge: users often choose impulsively, in ways that conflict with the platform's long-term payoffs. This misalignment makes it difficult to effectively learn to rank items, as it may hinder exploration of items with greater long-term payoffs. Our paper tackles this issue by utilizing users' limited attention spans. We propose a model where a platform presents items with unknown payoffs to the platform in a ranked list to $T$ users over time. Each user selects an item by first considering a prefix window of these ranked items and then picking their most preferred item in that window (and the platform observes its payoff for this item). We study the design of online bandit algorithms that obtain vanishing regret against hindsight optimal benchmarks. We first consider adversarial window sizes and stochastic iid payoffs. We design an active-elimination-based algorithm that achieves an optimal instance-dependent regret bound of $O(\log(T))$, by showing matching regret upper and lower bounds. The key idea is using the combinatorial structure of the problem to either obtain a large payoff from each item or to explore by getting a sample from that item. This method systematically narrows down the item choices to enhance learning efficiency and payoff. Second, we consider adversarial payoffs and stochastic iid window sizes. We start from the full-information problem of finding the permutation that maximizes the expected payoff. By a novel combinatorial argument, we characterize the polytope of admissible item selection probabilities by a permutation and show it has a polynomial-size representation. Using this representation, we show how standard algorithms for adversarial online linear optimization in the space of admissible probabilities can be used to obtain a polynomial-time algorithm with $O(\sqrt{T})$ regret.
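The user model in the abstract (examine a prefix window, then click the most preferred item inside it) is simple enough to sketch directly. The preference scores, ranking, and window sizes below are an invented instance for illustration; only the choice rule follows the model described above.

```python
import numpy as np

def user_choice(ranking, window, preference):
    """A user examines only the first `window` items of the displayed ranking
    and selects the item they prefer most within that prefix."""
    prefix = ranking[:window]
    return prefix[int(np.argmax(preference[prefix]))]

# Hypothetical instance: 5 items; the user's favorite overall is item 3.
pref = np.array([0.2, 0.9, 0.1, 1.0, 0.5])
ranking = np.array([2, 0, 4, 1, 3])    # the platform's displayed order

chosen_w2 = user_choice(ranking, 2, pref)  # short attention span: sees items 2, 0
chosen_w5 = user_choice(ranking, 5, pref)  # sees the whole list
```

With a window of 2 the user never sees item 3 and settles for item 0; only with the full window do they pick item 3. This is exactly the exploration obstacle the paper's bandit algorithms must work around, since the platform controls the ranking but not the window.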